In last week's episode of The Cockroach Hour, Jim Walker chatted with Cockroach Labs PMs Piyush Singh and Tommy Truongchau, along with security lead Aaron Blum, to talk about security features in CockroachDB. The full video is available here, and the transcript is below.
Jim Walker:
Hello, everybody. Good morning. Good evening. Good afternoon. No matter where you're at on this glorious planet today. It's beautiful and sunny here in Denver, Colorado. Just wanted to welcome everybody to this week's Cockroach Hour. Security is an important topic, I think, for all organizations. It's something that's near and dear to my heart. I started my career as a developer coding security constructs and RBAC controls and object libraries and whatnot. I love this topic. I always think of it as one of these big topics that usually causes a stir and typically something that people are very stringent about. You have to be good at it, otherwise it fails. Right? Because your security is only as strong as its weakest link, right?
We spend a fair amount of time here at Cockroach Labs thinking about security and weaving it into Cockroach database all over the place. Now, that said, this is the first time actually in one of these Cockroach Hours where we're doing just outright navel-gazing and looking inside CockroachDB and talking about security in the context of a database, and more importantly, a distributed database like Cockroach.
Thank you all for joining. That's what's on the hook for today. But real quick, before I get started, a little bit of housekeeping. There is a Q&A panel. Please do ask questions in there. Some of our sales engineers and some of our technical staff are in there. Sometimes we get some really, really great chat going on back and forth. Before you ask, yes, the recording will be available after the event. I know my friend, Dan, gets it up on our YouTube channel within, oh gosh, I don't want to give an SLA here, but it's pretty quick. But we will make everything available for you all.
Today's session is intermediate. I say it's intermediate to advanced, and we aren't going to get into code, but security is never simple. I don't think it'll ever be a basic conversation. People ask us, "What kind of session is this going to be?" Just straight up, we are going to get into VPC peering, we're going to get into certificates, we're going to get into data encryption and how we do that in Cockroach database. I'm sorry, we're not going to talk about geo-partitioning, latency and compliance, or code today. I'm sorry, those are the wrong bullets on the slide. My bad. But we will be giving away coffee mugs to the best questions. So please do ask your questions, and at the end of the session, I think JP will go through and choose some of the best questions, and we'll interact with you and get coffee mugs out to you. I think they're great, actually. I have two of them. They're awesome.
Jim Walker:
Without further ado, I wanted to thank my distinguished panel for joining us. Gentlemen, if you want to come on video, that'd be great. I want to see your faces. There's Aaron. There's Tommy. There's Piyush. Awesome. Thank you, each of you, for joining us. I'm honored to have this group today. This does make up a large portion of the security brain here at Cockroach Labs. There's some really intense stuff going on, especially as you secure data. Databases are not a simple thing to take care of.
I guess, if each of you could just introduce yourself, and then I think... I told everybody I love security because that's what I started coding. I love it. I think it's cool. It's a real geeky topic, but I think it's amazing actually because there's somebody always trying to break it. Right? But why is security important to you? Who you are, what your role is here at Cockroach Labs, and why is security important to you, I think would be a good ice breaker. Aaron, do you want to start us off?
Aaron Blum:
Sure. I'm the lead security engineer, so I get to touch lots of different parts of data process. That's build pipeline, that's database internals. I get to help work on the SaaS strategy to make sure that we're building something that's going to be consistently secure. I even get roped in to help customers with how to deploy this in a secure fashion and what choices they have as far as making that work.
Jim Walker:
Why is security important to you, Aaron?
Aaron Blum:
It's actually just a core passion. It's something that I've been doing for quite a while. One of the fun parts about Cockroach is you get to see that play out in ways that you wouldn't otherwise in a traditional application because of its scalability.
Jim Walker:
Piyush, I know you've been thinking about the security areas of CockroachDB for quite some time as well. Want to introduce yourself?
Piyush Singh:
Sure. I am the lead product manager for our operator experience group. We think about all of the operational aspects of your database. Security is obviously a huge part of that. In terms of why security is important to me, I just know how much our customers are trusting us to deliver a product that actually manages their data, keeps it not only alive, but also secure. This is all about delivering the thing that our customers need. That's why anyone gets into product management, right? It's all about like, "Hey, how can we not only get you to trust us, but build something that you will trust with your data?" That's fundamentally why this is super important to me.
Jim Walker:
Thank you, and thanks for doing this. I think it's one of those things. We've got to work the way that they work it. If we don't secure their data, problem. You aren't going to implement a database unless it's secure, right?
Piyush Singh:
Exactly.
Jim Walker:
Like Aaron said, it's not easy. I saved the best for last. My favorite product manager in the company. Sorry, Piyush, I'll just flat out just do it. I was just going to say it out here, dude. Tommy, do you want to just give a quick introduction as well?
Tommy Truongchau:
Yeah, sure. Hi, everyone. I'm Tommy, product manager. I joined Cockroach Labs around six months ago. I got the honor to work with Piyush and Aaron on the security side. Wanted to talk about why security is important from our perspective, right?
Jim Walker:
Yeah.
Tommy Truongchau:
Cool. Alluding to what Piyush was saying, as a PM, you hear a lot about impact and how do you get the most, how do you help your customers be successful and things like that. I find that, at least in this space, security is one way to unlock a lot of adoption and to get customers to trust and have confidence in using CockroachDB and CockroachDB Dedicated to secure and protect their data, manage their data, their workflows, and to run their workloads. That's exciting, because this space is just moving into this new world, it's cloud native, it's distributed. How do we actually make that secure is an interesting challenge and it's an interesting problem. So I find it really fun.
Jim Walker:
I think it's a really good point, Tommy. Working off a couple of different things, especially what Aaron said, and you said for that matter, doing security in a database is something that's complex and not to be shorted in terms of the intense focus that has to be put into that.
It's funny, when everybody moved to the cloud, they were like, "Oh, move to the cloud because they have the best practice in security," but then the cloud actually creates issues. You know what I mean? It creates whole new things. It's funny how people get in the cloud and then they realize, "Oh, my God, I have a whole new range of things that I have to worry about."
Let's just talk about just generally as a database. Piyush, as lead product manager of this area, how do you break it down when we start thinking about the concepts of security that are important in CockroachDB?
Piyush Singh:
Sure. There's quite a few different areas that we keep in mind as we're building out our product roadmap and making improvements. A few things that I can talk about, it starts with, let's say, encrypting your data at rest in Cockroach. Encryption at rest is obviously a major feature that we offer to secure your data once it's in Cockroach. We also want to secure all of that data while it's in flight. Connecting to your database securely, connecting new nodes to your cluster and properly authenticating them to join the cluster, encrypting the traffic that's flowing in between nodes. There's also role-based access controls. I know, Jim, that's something that you mentioned you've worked on. That's something that we're building out. We have to consider things like, what's our story with compatibility with Postgres? What types of controls are we offering? Are we making privileges that are fine-grained enough that our users aren't giving way too many permissions when they need to get someone to do something in their database?
Let's see, what else? There's also encrypted backup and restore. How is the data that's in your cluster being securely stored elsewhere so that, if the worst happens, you're able to recover. There's definitely a ton of topics, and I know Aaron and Tommy probably have even more that they could mention. I'm sure I missed a few there, but those are just a few of the ones that I can speak to, at least that we've worked on recently.
Jim Walker:
I think, as you move to the cloud, Tommy, you and I have talked a lot about there's even added things that come with a SaaS implementation, right?
Tommy Truongchau:
Yeah. Right. If you're referring to CockroachDB Dedicated, our managed offering, that's something we're trying to push along, get it out to market, and help make that easy for customers out there. We talk about things like... Oh, sorry, Jim. You were going to say something.
Jim Walker:
No, go on. No, no, you were just going there.
Tommy Truongchau:
No, I was going to talk about, for example, one of the new things we are including in CockroachDB Dedicated is, how do you securely connect to your clusters? We have new offerings around VPC peering for GCP clusters. Down the line, we're going to do AWS PrivateLink to connect to your AWS apps as well. These are a few examples of things that we're pushing the needle on.
Jim Walker:
We'll talk about all those today. I think it's going to be really interesting, y'all, when we get to multi-tenant as well, seeing a multi-tenant database deployed in the cloud, serverless. What does the world of security mean in this serverless world?
Aaron, I want to actually turn it to you because I think your background in terms of your deep knowledge of security, you've been in the security game for really quite some time. What we were just talking about, the concept of distributed systems adds a ripple to the challenge of what it means to be a secure database. What do you feel like the biggest challenge is that distributed systems adds to this security stuff?
Aaron Blum:
I think the main thing with a distributed system is you want it to be available and you want to make sure everything can communicate as it needs to. Often, security is making sure that unauthorized parties are not communicating or accessing things, so there's a fundamental push and pull there. I think our solution to that has been quite good in that we've worked to make sure the nodes have very strong trust between them. We have mTLS. That means that not only is the node sure that it's communicating with other nodes that are part of that trusted hierarchy, but it can itself use that same trust anchor to validate itself to those nodes. It's a web of trust that's just as resilient as the underlying data structures.
Jim Walker:
Yeah. It's almost like you got to think about it in a distributed environment, it's a web. It's not synchronous communication between two endpoints. What you're talking about is a many-to-many relationship. I've been in the Kubernetes space too for quite a while, Aaron. It seems like security gets in the way, and it is people, right? What do you feel like? How do you make it simple? What do you do? I only ask because, look, yeah, we're talking about Cockroach, but I think there's a lot of distributed systems people on the phone as well. Right? What's the best practice to think about that, right?
Aaron Blum:
Today we're looking at a couple of ways of doing it for Kubernetes. Internal to CockroachDB Dedicated, we orchestrate all the certificate management and all the trust primitives. We have our own system that handles that, but we realize that might not be satisfying. We do have ways for you to do it. You can treat them all as independent nodes and then store things within Kubernetes Secrets. Actually, last night, I spent a fair bit of time working back and forth with Ben trying to refine where we're going on this. We're looking at an approach that will allow nodes to behave as first tenants within the Kubernetes sets, so they'll come up, you'll pass them in an initialization and just say, "Hey, you're a CockroachDB node set." They'll be able to pair with each other, share a secret, and then come online. Any resources like user CAs and things that you want to anchor to will be provisioned to them, and it'll figure out the rest as it goes. You don't have to orchestrate it; you turn it on and it pretty much just works and behaves in an internally consistent and secure fashion.
Jim Walker:
Is that done natively in Cockroach, or is that something that's been... like we use an operator to do that. What are you thinking? How does that get implemented? Again, we're doing it for a database. I hope there's people on the phone who are doing it for whatever application they're building, and there's a best practice pattern. Right?
Aaron Blum:
We're trying to put as much of that into the application layer as possible to relieve the load from Kubernetes because anything that you've put into an operator starts to become a little bit more unusual or you have steps, and you wind up with, especially in a distributed system, the issue of, well, who comes up first and who initializes things if you have dozens or even thousands of nodes coming up? Who's in charge, and why do you trust that node, and what happens if that node fails? We really wanted to take an approach where the database itself has a resiliency to build the security structures in a similar fashion as the way it builds the other distributed components.
Jim Walker:
I think some of the same concepts we use within the database itself in terms of how it actually communicates. I mean, we're using things like gossip and that sort of stuff. I mean, it's basically we're just living on top of the already-great communication we're doing between those, correct?
Aaron Blum:
Yeah, absolutely. That's actually one of the things we're going to do for provisioning, or we're looking at doing, where after the nodes have established that they all belong to the same group, they'll be able to share all the rest of the configuration information underneath secure comms patterns using the underlying data structs.
Jim Walker:
All right, so I think the Kubernetes thing is going to be interesting. I think one of the other challenges within Kubernetes, and Tommy has touched on some of the stuff you were talking about as well and certificates and all this stuff. How do we deal with certificate management with Kubernetes and CockroachDB? How do you see customers doing this? I don't know. Aaron, you were out there, Piyush, I don't know. Who wants to jump in and answer that?
Aaron Blum:
I can take that one.
Today, we support really robust security controls around certs. You can use different trust anchors or CAs for the nodes communicating with each other. You can use different ones for authenticating users coming into the system, and you can also put your own certificates or allow the system to use self-signed ones for communicating to such ports. Depending on your deployment strategy, you may need to use an external CA to mint certificates for all your internal services. That's fine. We support that. If you're just doing a dev environment and you need to mint your own certs, you can do that too, so it provides that sort of granular access control.
Jim Walker:
Right. But I mean, typically, these things are involved. It's not simple to actually set up, get configured, deal with it. What have we done to basically simplify that too, or are we just going down that path? I know we work with that stuff. I mean, I don't think we would've been able to survive without working with it. What have we done to simplify it well?
Aaron Blum:
Today, we are pretty well-supported if you have the ability to orchestrate your own certificate story. Getting started is a little bit more friction, and we're actively working on improving that. Again, going back to the Kubernetes thing we talked about before where we're looking at being able to bring the nodes up and have them automatically communicate and gossip the other trust primitives, we're looking at doing the same thing for manual or scripted deployed solutions. The nodes would actually be able to generate an internode communication web with their own CA. It would not be externally exposed. It's opaque, so the nodes will be using strong TLS, which the user doesn't have to manage at all anymore. For everything else, if you want to set a certificate that you've signed or that you trust, you can do that. That's just a matter of putting the right certificates with the right names in the right config files or directories.
Jim Walker:
Right. How does that work, Aaron? I'm sorry to... I'm kind of intrigued by this actually. I'm trying to ask the question to get into it a little bit deeper. If you think about a TLS connection, there's some sort of certificate exchange between two nodes that actually have that conversation. How does that work? I mean, is it a public key, and then we use a symmetric key to do that? I guess I'm asking how TLS works a little bit.
Aaron Blum:
No, not at all. As it works today, each node gets a certificate to its own host name that's issued by a common CA. The public key for that CA is also installed in the certificates directory for each node, so the nodes can validate each other's certificates and represent themselves with signed certificates to be able to establish mTLS and mutually authenticate across the web.
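As a rough illustration of the setup Aaron describes (each node holds its own certificate and key plus the common CA's certificate, and both sides of a connection verify each other), here is a minimal sketch using Python's standard `ssl` module. It is not CockroachDB's implementation, and the file names `ca.crt`, `node.crt`, and `node.key` are assumptions made for the example.

```python
import ssl
from pathlib import Path

# Assumed file names inside a node's certificates directory (illustrative only).
CA_CERT = "ca.crt"
NODE_CERT = "node.crt"
NODE_KEY = "node.key"


def server_context() -> ssl.SSLContext:
    """TLS context for accepting connections from peer nodes."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Require the connecting peer to present a certificate: the "mutual" in mTLS.
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def client_context() -> ssl.SSLContext:
    """TLS context for dialing peer nodes."""
    # PROTOCOL_TLS_CLIENT verifies the server's certificate and hostname by default.
    return ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)


def load_node_credentials(ctx: ssl.SSLContext, certs_dir: str) -> ssl.SSLContext:
    """Load the common CA (to validate peers) and this node's own cert and key."""
    certs = Path(certs_dir)
    ctx.load_verify_locations(cafile=str(certs / CA_CERT))
    ctx.load_cert_chain(
        certfile=str(certs / NODE_CERT),
        keyfile=str(certs / NODE_KEY),
    )
    return ctx
```

A node would call `load_node_credentials` on both contexts, so the same CA certificate is used to check peers whether the node is accepting a connection or initiating one.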
Jim Walker:
Right. So then basically we're using public-private key infrastructure, which has been around for a long time and proven to establish secure communication between the nodes. I guess that ticks the first part off, secure data in motion. Wasn't that one of your key things, Piyush, right? Is there anything else we have to do to secure data in motion?
Aaron Blum:
I've got the auth picture. We need to make sure that we're talking to the right people.
Jim Walker:
Yeah. Yeah.
Piyush Singh:
Exactly. That's actually like a pretty large area. It's deceptively large because there's all sorts of external integrations that people want supported. There's authentication and authorizations that every user will have when you're talking to your database. There's existing systems, like Active Directory and Kerberos and all of these other tools that people want to use because they centralize authentication throughout their entire company, especially large companies, large enterprises will want to have a single central place where they manage all the permissions for their users, and if someone joins or leaves the company, they just have to work in that one place instead of going through and provisioning accounts for them in every single internal service that they provide.
That's something that we're actually working on. We do, for example, support Kerberos integrations for authentication. We're kind of laying the groundwork, doing a lot of the security work to enable authorizations as well, authorizations meaning what privileges users actually have inside of the database. I mean, that's on the database side. We also have this wonderful packaged Admin UI that ships with our database, and there, we're working on things like single sign-on that's actually coming in our 20.2 release.
Piyush Singh:
Again, it's that story of you don't want to have to create a username and password for every single user that's accessing your database. You just want to use some central OAuth provider to allow people to access their Admin UI and see what's happening inside of your cluster, so properly authenticating and authorizing users is definitely a huge area that security covers in Cockroach.
Jim Walker:
There's other types of connections though. Again, well, we'll just round out this whole section with VPC peering. Tommy, we added VPC peering over the summer I think with AWS, now with VPC, so just explain to me what it is and what's the state of that project now?
Tommy Truongchau:
Yeah. Yeah, for sure. What we tend to find is that a lot of our customers, they tend to run their apps in their own virtual private networks, and one of the challenges that we had with Cockroach cloud in the beginning was the fact that how do you connect to those other VPCs in your cloud provider? We didn't really have a way to do that. What we ended up seeing were a lot of customers allowlisting the public internet to allow traffic between Cockroach's clusters and their applications. That works. Seamless UX, you can say, but it's not actually secure from their perspective. It's not actually desirable from what they need.
We heard this come up a lot and a lot from all of our CockroachDB Dedicated customers. Over the summer, we were able to bake in support for VPC peering over GCP. You do the same thing with AWS clusters, but doing it using their PrivateLink endpoint stack. That's currently in process right now. Actually, just today, we enabled VPC peering for all Cockroach cloud customers. That's available for all folks using Cockroach cloud today. The PrivateLink self-service UI, that's going to be coming down later this year, but we're excited to get that out soon for people.
Jim Walker:
Awesome. Congratulations. I know that was a bit of a labor of love to get that out there, but it's basically, it's just configurable via the Admin UI in Cockroach cloud now, correct?
Tommy Truongchau:
I'm sorry?
Jim Walker:
Is it configurable just in the Admin UI now in CockroachDB Dedicated, right?
Tommy Truongchau:
Yeah, VPC Peering is available in the CockroachDB Dedicated admin UI.
Jim Walker:
So-
Piyush Singh:
Really, the big point to hit there too is this is kind of a challenge that people have, especially with modern cloud deployments where IP white listing is, or IP allowlisting I should say is challenging because you are no longer running your application on something that has a fixed IP address. If you're orchestrating your application, you're killing pods, bringing them up based on the volume of requests that are coming into your application, you're going to have new IP addresses coming in and out of existence, and trying to connect to your Cockroach cluster is like how do you know which ones you should allow to connect to your cluster? It's kind of that new modern infrastructure that's kind of driving this need for peering, which I think is kind of interesting to see.
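The problem Piyush describes can be sketched in a few lines. This is only an illustration with made-up addresses: a fixed allowlist rejects a pod that was rescheduled onto a new IP, while a check against a peered network range (one way to think about what peering gives you) still admits it.

```python
import ipaddress


def allowed_by_ip_list(client_ip: str, allowlist: set) -> bool:
    """Classic allowlisting: the exact address must be listed ahead of time."""
    return client_ip in allowlist


def allowed_by_peered_range(client_ip: str, peered_cidr: str) -> bool:
    """Range-based check: any address inside the peered network is accepted."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(peered_cidr)


# Hypothetical values for the example.
allowlist = {"10.8.0.11", "10.8.0.12"}   # pod IPs known when the list was written
rescheduled_pod_ip = "10.8.3.47"         # same app, new pod, new address
peered_cidr = "10.8.0.0/16"              # address range of the application's VPC

print(allowed_by_ip_list(rescheduled_pod_ip, allowlist))         # False
print(allowed_by_peered_range(rescheduled_pod_ip, peered_cidr))  # True
```

The allowlist has to be updated every time the orchestrator creates a pod, whereas the range check keeps working as pods come and go.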
Jim Walker:
Yeah. Yeah. I think it's a unique challenge. Well, it's actually not even a challenge just for us as a database. I think lots of applications are dealing with this and this complexity, and this is kind of one of those new things like, yeah, moving to the cloud, and all of a sudden, I get there, and what else... What? I've got a bunch of other stuff that I have to take care of now, so it's a pretty good example of that.
I actually want to go back a little bit, Piyush, there was a question in the chat about audit as well, so there's the whole triple A of security, right? There's authentication, authorization. Is it auditability, accounting, whatever, right? When we talk about RBAC and role-based access control, we talked a little bit about authenticating user, but the authorization part of that thing, we put that into CockroachDB Core in the spring. What led you to that decision here because I like it when features get into our... We do run an open core business model, so what got you there?
Piyush Singh:
For sure. Yeah, and to set some context, we kind of have a rough rule of thumb that we follow for which features we think should fall into the open core part of our product versus the commercially licensed part of our product, which is do we think it's useful to startups or do we think it's something that's more useful to sophisticated enterprise customers? Historically, we kind of thought that role-based access controls were more of the latter, like you need very sophisticated access controls when you have an organization that's like hundreds of thousands of people, but I think what we ended up finding was actually there's a huge Postgres compatibility story here.
Actually, the way Postgres treats roles and users, it actually treats them interchangeably. In order to support that concept of making users and roles interchangeable to match Postgres syntax to support external tools that rely on Postgres syntax and different application frameworks, we actually ended up realizing, "Okay, well, we have to essentially make this role-based access control stuff fall into the open core part of our product, because, otherwise, it's just going to break a lot of these integrations." In order to make the user experience better for basically everyone who's connecting into the database, we decided it just made more sense to put this into the open core part of the product.
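To make the "interchangeable" point concrete: in Postgres, a user is simply a role that is allowed to log in, and any role can be a member of another role and inherit its privileges. The toy model below sketches that idea in Python. The class and function names are invented for illustration, and this is not how Postgres or CockroachDB store roles internally.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """A single kind of principal; a 'user' is just a Role with can_login set."""
    name: str
    can_login: bool = False
    privileges: set = field(default_factory=set)
    member_of: list = field(default_factory=list)

    def effective_privileges(self) -> set:
        """Own privileges plus everything inherited through role membership."""
        result = set(self.privileges)
        for parent in self.member_of:
            result |= parent.effective_privileges()
        return result


def create_role(name: str) -> Role:
    return Role(name)


def create_user(name: str) -> Role:
    # Same object type as a role; the only difference is the login attribute.
    return Role(name, can_login=True)


analysts = create_role("analysts")
analysts.privileges.add("SELECT ON orders")

maria = create_user("maria")
maria.member_of.append(analysts)

print(maria.effective_privileges())  # {'SELECT ON orders'}
```

Because users and roles are the same kind of object, a tool that issues role statements against a user name (or the reverse) keeps working, which is the compatibility concern described above.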
Jim Walker:
Honestly, I think security is a baseline. You can't build an app without some sort of foundational security. I don't care if you're building a simple birthday app. You know what I mean?
Piyush Singh:
True.
Jim Walker:
Maybe it's just because I've been in the security space for so long, at least tangentially related. I'm no Aaron, but I at least care about this thing. I think that, making it a part of our core product was something that we talked about at the last release, and I don't think it was very well understood. It's like, "It's not just for the enterprise, it's for every single company."
I want to shift a little bit from the data in motion, which I think that was where we're at. Let's talk a little bit about data at rest, as well. There's two sides of this, right? So, encrypting data at rest, and then there's the backup, and restore. So let's talk about the encryption at rest. How does that work today? I know we can do that, at what level can we do that? Who wants to pick that one off? Aaron? I see Aaron going for the mute button.
Aaron Blum:
I had the mute button already off. I can't speak too deeply about this. I know that we actually just shifted our storage engine. We went from RocksDB to Pebble, and I remember the PR for that, because we basically evicted a bunch of the old C++ code that enabled RocksDB to do encryption at rest, and replaced it with our own Pebble engine. I can't go deep into the weeds. What I can say, though, is all the nodes, if configured to use encryption at rest, will write encrypted data, and only encrypted data, to the data stores on disk.
Jim Walker:
And it's all configurable, right? I mean, at what layer can you actually configure encryption at rest? Is it the whole database? Is it a table? Is it a row? Is it a column?
Piyush Singh:
As far as I know, it's at the entire cluster level today. We've heard requests for things like row level, or column level security, and controls around that, and that's definitely something that we're looking at long-term, but it's definitely the whole cluster today.
Jim Walker:
What about when we do integrations with change data capture (CDC)? Is it that the CDC capabilities we have are doing the encrypting, or is it basically that whatever we're feeding into CDC is encrypted?
Piyush Singh:
That's a great question, actually. Aaron, do you know, I know we've done a little bit of work on that recently, I don't know if you're familiar?
Aaron Blum:
I don't know the current state, and I don't want to misrepresent it.
Jim Walker:
Fair enough. I think there was a question in the chat about recovery and master keys. Let's talk about backup and restore. Backup and restore on distributed systems is not simple, and then encrypted backup and restore also nuts, as it's compounding. Compounding complexities.
Piyush Singh:
I know one thing that we've been thinking about recently is how we can integrate with external secret managers. You probably don't want to be in the business of managing the complexity of dealing with all of these keys, and rotating, and doing all of that by hand, or scripting it. So, there's just external tools that are built to handle all of that stuff. Now it's on Cockroach, like, "Okay, we see our customers are using these things, which is the security best practice. How can we support them to make sure that they're successful?"
Piyush Singh:
So actually, we have started laying the groundwork for support of AWS KMS, right? So that all of the security backup stuff happens without you having to get way into the weeds of it, and just making it way more hands-off, and easier to use. That's the thing that's top of mind right now. Obviously we do support encrypted backups, and restoring from those backups, so I think that was something we added relatively recently. I don't remember the exact release off the top of my head.
Jim Walker:
We have an enterprise feature, but that's not incorporated, because we're moving some of our backup restore features into Core, this release, correct?
Piyush Singh:
That is correct. I don't think that is part of the piece that's moving to Core.
But that is something that's super exciting for the 20.2 release too. We've definitely heard a ton of requests for like, "Hey, can we get the distributed backup into the Core part of the product?" So, we're really excited to be making that change.
Jim Walker:
And distributed backup, it's one of these things that I guess I didn't really get. I thought we just did it naturally, but it's actually not simple to do. Because in our database, we can actually do something called geo-partitioning, which is tied into locations. If you're going to do a backup, and you have some sort of policy, that you're meeting some compliance or regulation thing because of GDPR in Germany. Whatever that is, customer data, it needs to reside in certain places. If you just did a backup and restore of the entire cluster, and that went to one central repository, you've just violated all that policy.
Piyush Singh:
Yep. The trick is restoring as well, right?
Jim Walker:
Exactly.
Piyush Singh:
Like, "Hey, if something happens, I need to also restore this in such a way that the data never leaves the specific geographic region." That's super tricky.
Jim Walker:
It's just not simple. It's one of these concepts of the distributed systems where you just think, "Oh, wait, oh, wait. That's actually..." and then you get into it.
Aaron Blum:
Yeah. That was actually the biggest snag we hit when we first started exploring the AWS KMS, because we're like, "Okay, well we'll just generate keys. Oh, wait, we're going to need different keys for the different regions, because we don't want one region to be able to decrypt the backups. Okay, we're going to need to send these to different regions, and they're going to need to have different keys, but do we keep a master key for that?" It was a really interesting problem, and I don't know where it currently is, but watch this space, because we're definitely working on making sure that we can preserve the integrity of that data, as well as the privacy.
Jim Walker:
Yeah. To me, it's just one of these things you guys had, like an even distributed system. Everything old is new again, I guess, eventually. We have solved these things in Postgres, and MySQL, and Oracle, and everybody doing it distributed, it just adds a layer of complexity. I think it's actually a pretty important point. One of the other questions that came up, and while we're still talking about encryption, "When will you be able to mask data in Cockroach?" I don't think we can do it today, correct me if I'm wrong, but is that something that's on the roadmap, Piyush?
Piyush Singh:
It's something we've heard requests for. We're definitely looking at it, but we don't have any immediate plans to support data masking yet. So, we're still in the early requirements gathering phase for that.
Jim Walker:
I think right now, Aaron, what we're seeing is organizations, they do it around the deployment architecture, what they like, and it's basically the entirety of the database, correct?
Aaron Blum:
Yeah, for now.
Jim Walker:
So, thank you, that's really good guys. So, there was another question, and actually it was something that I wanted to talk about as well. You usually talk about integrating with authorization frameworks, and these things. There's also alerting and monitoring when it comes to security, and there's a whole suite of tools out there that allow you to understand what's going on. I mean, from Splunk, to, I don't know, somebody in the observability space? What does Cockroach implement on that side of the world? Let's just start with alerting, we'll come back to logging.
Piyush Singh:
Okay. Yeah, we're actually starting to build out some partnerships in that space. So, we are looking at what external vendors we want to support. In terms of pure monitoring, obviously we're starting to build out metric based monitoring support, with companies like Elastic's Kibana, Datadog. So, they will be able to scrape our API endpoint--just metrics--and then we can configure alerts on top of that. In terms of security alerting, I would expect most of that to be through logging.
So, we are looking at companies like Splunk. How can we better integrate with them, how can we--this is starting to get into logs--how can we format our logs so that they're easy to ingest there?
Jim Walker:
It's all related, right?
Piyush Singh:
It is, it is. Yeah, it's that question of like, "Okay, how can we feed these tools with the proper data?" So, to that end, we're starting to generate these trails, these audit logs of all the different actions users are taking in the database.
Things like, "When are users provisioning new accounts? When are they granting those accounts permissions? When are people connecting and authenticating to the database? When are they connecting and failing to authenticate?" All of those events, we actually track and store into audit logs, or we have an event log that tracks changes, things like that. Then ideally, you would feed that into some external tool, which would then be able to monitor and alert on top of these events.
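As a rough picture of what "feeding these events into an external tool" might look like, here is a minimal sketch that counts failed authentications per user and flags accounts over a threshold. The event lines, field names, and event types are hypothetical and are not CockroachDB's actual log format; a real deployment would rely on the log shipper and alerting rules of whatever tool it uses.

```python
import json
from collections import Counter

# Hypothetical event lines; the field names and event types are illustrative,
# not CockroachDB's actual log format.
SAMPLE_LOG = """\
{"event": "create_user", "user": "admin1", "target": "app"}
{"event": "login_failed", "user": "app"}
{"event": "login_ok", "user": "admin1"}
{"event": "login_failed", "user": "app"}
{"event": "login_failed", "user": "app"}
{"event": "grant", "user": "admin1", "target": "app"}
"""

def failed_logins(log_text: str) -> Counter:
    """Count failed authentication events per user."""
    counts = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("event") == "login_failed":
            counts[record["user"]] += 1
    return counts

def users_to_alert(log_text: str, threshold: int = 3) -> list[str]:
    """Return users whose failed-login count reaches the threshold."""
    return sorted(u for u, n in failed_logins(log_text).items() if n >= threshold)
```

Calling users_to_alert(SAMPLE_LOG) on the sample above flags the hypothetical "app" account, which has three failed logins.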
Jim Walker:
Yeah. Then, doing it in real time, Piyush, is this something that's scraping the Prometheus endpoint? I mean, I think that's what people are also interested in. So, this is the real-time thing, too. How does that work right now?
Piyush Singh:
Yeah. So, this is veering into intrusion detection systems, and I know this is something Aaron is actually very passionate about, so I will actually kick this over to him.
Jim Walker:
Go on, Aaron.
Aaron Blum:
So, today we have a number of logs, and you can configure what goes into the audit logging. We found that, that's fine for customers that know exactly what they want to audit, but we're actually working on aligning things to a security specific log sink, so that you can have a security log that's emitted and monitored directly. Then you'll get that at the right time, so the log will continue to just spill to disk today. We're looking at ways to actually feed that to a network sink, or something else, but once that's done, you'll have a fairly high fidelity feed of all the security events from the cluster.
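The idea of a dedicated security sink can be sketched with Python's standard logging module: every event goes to a general sink, while a filter routes only security-tagged events to a second sink. This is a generic illustration, not CockroachDB's logging design; the "channel" tag, the logger name, and the messages are assumptions made up for the example.

```python
import io
import logging

class SecurityOnly(logging.Filter):
    """Pass only records tagged with the hypothetical 'security' channel."""
    def filter(self, record: logging.LogRecord) -> bool:
        return getattr(record, "channel", None) == "security"

def build_logger(general_sink, security_sink) -> logging.Logger:
    """Send every record to general_sink and only security records to security_sink."""
    logger = logging.getLogger("db_sketch")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False

    general = logging.StreamHandler(general_sink)
    security = logging.StreamHandler(security_sink)
    security.addFilter(SecurityOnly())

    logger.addHandler(general)
    logger.addHandler(security)
    return logger

# In-memory sinks stand in for a file on disk and a network destination.
general_out, security_out = io.StringIO(), io.StringIO()
log = build_logger(general_out, security_out)
log.info("range split completed")
log.info("login failed for user app", extra={"channel": "security"})
```

After these two calls, the security sink holds only the failed-login line, while the general sink holds both, which is the "pure security feed" versus "noisy everything" split discussed here.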
Jim Walker:
Do all databases do it like this, Aaron?
Aaron Blum:
Not in my experience. It's a grab bag, depending on who wrote what, where. Whether you're going to get a pure security log, or whether you're going to have to go pick through a lot of other noisy events and try to isolate the things that actually matter to you. So, we're trying to make it very, very digestible.
Jim Walker:
That's cool, I think I've gotten through all the questions that we had here. Then, I think we actually talked about the last one, too, Tommy. We've talked a fair amount about, "What have we learned in CockroachDB Dedicated," but beyond VPC peering, is there anything else that we've learned in deploying Cockroach as a service ourselves, that might be a best practice for people to think about as they deploy their distributed applications? I don't know, Tommy or Aaron?
Tommy Truongchau:
I'm noodling through right now, but Aaron, if you want to jump in, feel free to.
Aaron Blum:
All right, go for it.
Tommy Truongchau:
Yeah. I mean, I guess I would say if you have VPCs, try to connect with that. We highly discourage you from connecting your traffic through open internet. But, on that topic, there is the conversation of, there's a free tier of Cockroach Cloud that's coming up, that's being worked on by the team. And one of our requirements there is, let's simplify that UX. If CockroachDB Dedicated customers aren't able to connect to their VPCs right now, but we want to remove that ability for them to... Or to require them to allowlist IPs in order to connect, how can we improve the security posture of CockroachDB Dedicated, by default, to help them connect there entirely? So there are things that are happening behind the scenes to enable that. I don't know, Aaron, if you wanted to hone in on some of the details there, if we're allowed to talk about it.
Aaron Blum:
Well, I'll talk about it from a logging standpoint, because that's actually one of the places that really highlighted... As I looked at trying to secure CockroachDB Dedicated and make sure that we were protecting our customers' data, I realized that picking through all of the emitted log messages was very, very high volume. And anybody that's worked with an enterprise SIEM knows that they tend to charge you either by volume or by lines. And I didn't want to ingest all the lines from the database. I want to ingest the security events.
So, going back to that, I started getting requirements around, and development work around, getting just that security feed, which we can then build models of what's normal and what's not. And so, instead of forcing customers to use an allow list, or asking them to take specific VPC actions, we're going to put controls in place that would allow us to identify bad actors that are probing the infrastructure and reject them or limit them, and allow customers to continue to interact, without feeling the pain of these additional security controls.
Jim Walker:
Don't get in their way. Just make it work, right?
Aaron Blum:
That's right. It should just work.
Jim Walker:
It should just work. And I think that's the trick with security. When it just works, nobody even notices it. It goes underappreciated, but it's actually really awesome. So kudos to you and the team, Aaron, because I think it's some really kind of elegant work beyond that. And you had referenced Ben earlier, Ben Darnell, one of our founders, probably one of the single best engineers I've ever met in my life. He's brilliant. And so these are not easy things to solve and to do it kind of elegantly. So what other questions? So do we have any plans to integrate with Linux security groups and users, Piyush?
Tommy Truongchau:
That's a good question. I actually haven't heard that request to date. So we don't have any plans yet, but actually, if you're willing to file a GitHub issue, I would happily follow up there.
Jim Walker:
That's right. And we are open. And then, I guess, we use asymmetric encryption. What does that mean? Who wants to take that?
Aaron Blum:
I'll take a shot at answering this one. I don't know the exact context that's being quoted from, but all the PKI work that we do is asymmetric crypto. So if you want to authenticate to the cluster using certificates, you're not sharing a symmetric key that can be compromised in both places. It's, you have one end and that authenticates you, you have your private key and then through a standard key exchange, you'll establish secure communications using TLS.
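The "standard key exchange" part can be illustrated with a toy Diffie-Hellman sketch: each side keeps a private value, only public values cross the wire, and both sides still arrive at the same secret. This is a teaching sketch under loud assumptions, not what TLS or CockroachDB actually runs: the parameters P and G are chosen for readability and are not safe for real use, the function names are hypothetical, and certificate-based authentication (the signature step) is not shown at all.

```python
import secrets

# Toy public parameters. P is the Mersenne prime 2**127 - 1, which is far too
# small and unstructured for real use; real TLS uses vetted groups or curves.
P = 2**127 - 1
G = 3

def keypair() -> tuple[int, int]:
    """Return (private, public); the private value never leaves its owner."""
    private = secrets.randbelow(P - 3) + 2  # uniformly in [2, P - 2]
    public = pow(G, private, P)
    return private, public

def shared_secret(own_private: int, peer_public: int) -> int:
    """Combine our private value with the peer's public value."""
    return pow(peer_public, own_private, P)

client_private, client_public = keypair()
server_private, server_public = keypair()
```

Both shared_secret(client_private, server_public) and shared_secret(server_private, client_public) evaluate to the same number, even though neither private value was ever transmitted. In practice a TLS library handles all of this, along with verifying the certificates.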
Jim Walker:
Yep. And I think, just looking up how public key infrastructure works, helps people understand how asymmetric encryption works. I think it's truly awesome. So I'll go way back, you guys. Again, I got to give a little shout out to my history, man. We had implemented SHA-1, MD5 and embedded a 128-bit symmetric key into the BIOS of the machine, all within about 25K of code. So that you can actually take that symmetric key, which you were guaranteed what that was, and actually implement public key infrastructure from the BIOS up into the OS, which I thought was pretty damn cool, but that was some, again, remarkable engineering, but that was symmetric, but it allowed us to do asymmetric eventually. There's some really cool stuff and a lot of good reading out there, for the person who asked the question. This is a wonderful topic to go down. Because it's super interesting and really, really cool.
Well y'all, I think we covered everything. We covered all the concepts that we kind of outlined at the beginning. I think we hit all the questions. There was one question that was like, "Can we install the admin UI separate from our cloud cluster," or whatever that is? I think, to the person who asked this question, Dietrich, the Admin UI comes with every node of Cockroach. Every node of Cockroach is one atomic unit. There are no different types of nodes. There's no admin node, a storage node, and a transaction node. A node is a node is a node. It comes with all of this security, it comes with all of our UI, and it comes with all of the CLI, it comes with all of the CD...
Jim Walker:
The binary is the binary, you can connect to any node, and you can go back to the admin UI, which is just awesome. And talk about key concepts in distributed systems. And being, living up to those primitives, that single atomic unit, being the full context of our software, single binary, is really what allows us to scale very easily, just at the drop of a hat, and point it at the cluster, as long as you have a TLS connection, you're good.
Aaron Blum:
That was actually one of the challenges on the Kubernetes side, because since all the nodes are the same, it's not like you bring up a master and then you bring up a bunch of auxiliary nodes. It's like, all the nodes come up, they're all the same. There's nothing to differentiate them, which when you're trying to establish the trust primitives that build the entire Kubernetes set, how do you pick one? Or do you pick one?
Jim Walker:
We solved that, Aaron, right? So I think that's one of those things. And again, it comes back to these, as you're building out distributed systems, as you're going down these paths trying to figure that out in your own applications, this is one of the things about being an open source company, and contributing back to the community. And in putting our code into the core product, where people can actually go and investigate this stuff and see that in practice, which to me, I get these conversations about open source all the time, like, "License, license..." Well, there's also a whole bunch of code and incredible software engineering that's out there. And I think this is one of those areas that, this is Cockroach giving back to the world as well. And some great minds in our team. And again, now there's always going to the docs. So with that, let's see, there's... Let's see here. There's one more question: Is it possible to disable the root user inside Cockroach and have different admins connect to the cluster?
Aaron Blum:
I can take that one.
Jim Walker:
Yeah, can you Aaron? That'd be great. So just paraphrase the question really quick. Yeah.
Aaron Blum:
Yeah. So CockroachDB has a special root user that is used for doing certain cluster administration events. It is a very special user, and by default, does not have a password set. You can only use a certificate to authenticate as it. It's basically a very special maintenance account. At some point in the future, I would like to see that disappear and become a named admin account that can be anything that you want, but today, it is a special account. We do not recommend using it for anything outside of cluster administration. And we have a new slew of permissions, I think landing in 20.2, that'll actually make it easier to administer clusters with named users that are not that special user.
Tommy Truongchau:
It's also worth mentioning that we do have a dedicated admin role that does give you a select slew of admin privileges, so you can grant the admin role to a user that you want to run a lot of these administrative commands, and then not have to use the root user.
Jim Walker:
And all pretty well-documented. Yeah. Very, very well documented. Jesse on the docs team, again, just does a great job.
Aaron Blum:
I think you can also potentially put a line in for the host-based auth to prevent remote access to the root role. So you would have to be local to the device with the certificate to be able to authenticate.
Jim Walker:
One last thing, I guess, Aaron, you talked a little bit about this before. Are there different levels of auditing within Cockroach? I think we're kind of building that in now, right?
Aaron Blum:
Absolutely. So you have everything from the security auditing that we were talking about earlier, all the way down to every query and every transaction is audited, so depending on your level of granularity. Your mileage might vary if you turn on audit everything that the database does, because that's going to be a lot of IO, but it's available to you and it's tunable.
Jim Walker:
Okay. And then actually, this is Alex Robinson's shout out: If you're running two clusters in OpenShift and you wanted to connect them, how do you do that? This just shipped out OpenShift for Kubernetes, right? Yeah, multiple confederated clusters, right, Aaron? This is not a simple thing to do. And how do you connect nodes within Cockroach, which is actually a different layer, right? We're not talking about a federated cluster, we're essentially federating data now at the database layer. How does that work? I know we have a pretty good video about this I think Alex Robinson had done for us a while ago. Do you know how that works, Aaron?
Aaron Blum:
No, I can't speak to that.
Tommy Truongchau:
I will say, one super nice thing is actually, we just had a Kubernetes operator that was OpenShift certified, that will be available on... Actually, I think is available now on the Red Hat marketplace. And so one super simple answer is, use the operator.
Jim Walker:
Well, yeah, use the operator to deploy in OpenShift. And I think that the complexities of doing multi-region applications on top of Kubernetes is not simple. We're doing it in our SRE team today. So I think the easiest way to get that done is to actually go use CockroachDB Dedicated.
All right, you guys, so listen, thank you very much, all three of you, for taking the time today. That was really, really helpful. Actually, I learned a fair amount today. I miss our lunches together, you guys. I miss sitting around and having these conversations, because you learn more about the company that way than in... But great work, and thank you all for joining. And thank everybody for joining us today. And there was a lot of really good questions. We really appreciate it. We hope it was helpful for everybody. Again, the recording will be up and available. There is a survey after this event as well that JP will put in front of everybody. Please do complete that. It really helps us get better, and any and all of that feedback is just really wonderful.
Aaron Blum:
Bye Jim.
Jim Walker:
Tommy, see you Tommy. Piyush, we'll see you later, buddy.
Tommy Truongchau:
See you Jim.
Jim Walker:
Thanks everybody, and have a great day.